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Abstract — We  show  how  to  solve  two  problems  of  optimal 
linear  estimation  from  a  finite  set  of  phase  data.  Clock  noise  is 
modeled  as  a  stochastic  process  with  stationary  c/th  increments. 
The  covariance  properties  of  such  a  process  are  contained  in 
the  generalized  autocovariance  function  (GACV).  We  set  up  two 
principles  for  optimal  estimation;  these  principles  lead  to  a  set  of 
linear  equations  for  the  regression  coefficients  and  some  auxiliary 
parameters.  The  mean  square  errors  of  the  estimators  are  easily 
calculated.  The  method  can  be  used  to  check  the  results  of  other 
methods  and  to  find  good  suboptimal  estimators  based  on  a  small 
subset  of  the  available  data. 

I.  Introduction 

Suppose  that  the  phase  residual  x  (t)  of  a  clock  (or  dif¬ 
ference  of  clocks)  is  given  at  a  finite  set  of  times,  T  = 
which  don’t  have  to  be  equally  spaced.  We 
consider  two  estimation  targets:  1)  the  phase  value  x  (i* ) 
at  some  t*  ^  T;  2)  the  overall  “trend”  coefficient  of  x  ( t ). 
Depending  on  the  model  for  x  ( t )  and  what  else  is  known,  this 
can  mean  the  long-term  average  value  of  phase  or  of  its  dth 
time  derivative:  frequency,  drift  rate,  or  aging  rate  (d  =  1,2, 
or  3).  For  each  target,  we  want  to  calculate  the  linear  estimator 
Y^i= i  aix  (U)  that  is  optimal  in  the  mean-square  sense  while 
satisfying  an  invariance  condition  that  will  be  explained  below. 

To  carry  out  this  program,  we  need  a  stochastic  model  for 
x  ( t ).  The  general  model  used  here  is  the  class  of  stochastic 
processes  with  stationary  dth  increments,  the  subject  of  a 
monograph  of  Yaglom  [1],  These  include  stationary  processes, 
indefinite  integrals  of  stationary  processes,  and  all  the  power- 
law  processes  familiar  to  the  time  and  frequency  field:  flicker 
PM,  white  FM,  flicker  FM,  and  so  on.  Yaglom  showed  how 
to  solve  problems  of  optimal  prediction  and  filtering  for  these 
processes  from  their  values  on  unbounded  or  bounded  time 
intervals.  For  pure  power-law  processes,  other  authors  have 
derived  the  mean  square  error  (MSE)  of  the  predictor  that 
is  based  on  the  infinite  past  [2]— [4] ;  Boulanger  and  Douglas 
[5]  calculated  the  MSE  of  two-point  linear  extrapolation. 
Vernotte  et  al.  [6]  calculated  predictors  and  their  MSEs  based 
on  extrapolation  of  least-squares  linear  and  quadratic  fits  of 
equally  spaced  finite  data  sets.  For  finite-state  clock  models,  a 
recursive  optimal  predictor  and  its  MSE,  based  on  all  discrete¬ 
time  past  measurements,  can  be  calculated  from  a  Kalman 
filter  [7].  The  MSE  of  various  suboptimal  drift-rate  estimators 
has  been  calculated  for  power-law  noises  [8] — [1 1].  Elere  we 
show  how  to  calculate  the  regression  coefficients  a*  and  the 
MSE  of  the  optimal  linear  estimators  of  both  targets,  using 


systems  of  linear  equations  that  generalize  the  equations  of 
orthogonal  projection  for  stationary  x  ( t ). 


II.  Clock  noise  models 

A  real-valued,  mean-square  continuous1  stochastic  process 
x  ( t )  is  said  to  have  stationary  dth  increments  ( d  >  1)  if  for 
each  r  the  process 

Atx{t)  =  (f)  (-i)fca;(t-  hr)  (1) 

k—0  '  ' 

is  stationary.  It  is  convenient  to  let  SI  (d)  denote  the  class 
of  all  such  processes,  and  to  let  SI  (0)  denote  the  stationary 
processes.  Then  SI  (d)  C  SI  (d  +  1),  and  we  define  the  degree 
of  x  ( t )  as  the  least  d  such  that  x  G  SI  (d). 

Everything  we  know  about  a  process  a;  €  SI  (d)  is  wrapped 
up  in  the  dth  increments  (1),  which  do  not  change  if  we  add  a 
polynomial  of  degree  <  d—  1  to  x  (t).  In  this  sense,  a  process 
in  SI  (d)  is  ambiguous.  Any  use  of  these  processes  must  take 
account  of  this  ambiguity. 

For  any  x  e  SI(d),  Yaglom  established  a  nonnegative 
spectral  density  function2  Sx  (/)  that  extends  the  notion  of 
spectral  density  for  stationary  processes.  Here,  we  are  using 
the  two-sided,  even  version:  Sx  (— f)  =  Sx  (/).  For  d  >  1, 
Sx  (/)  can  diverge  as  /  — ►  0  but  obeys  the  restrictions 


f2dSx(f)df<  oo, 


Sx  (/)  df  <  oo. 


(2) 


The  process  also  has  an  average  trend  coefficient,  denoted 
by  Cd,  that  can  be  defined  as  the  infinite-time  average  of  the 
stationary  process  Afx  (t): 


Cd 


1 

°o  ^2  —  ti 


Afx  ( t )  dt, 


the  limit  being  taken  in  the  mean-square  sense.  In  this  treat¬ 
ment,  Cd  can  be  a  random  variable.  We  often  want  to  get  rid  of 
Cd,  and  there  are  two  ways  to  do  it.  First,  if  we  know  Cd,  then 
we  can  consider  the  process  xq  (t)  =  x  (t)  —  Cdtd/d\,  which  is 
also  in  SI  (d)  but  has  trend  coefficient  zero.  Second,  if  we  don’t 
know  Cd,  then  we  can  treat  x  ( t )  as  a  member  of  SI  {d  +  1); 
as  such,  its  trend  coefficient  Cd+ i  is  always  zero.  For  example, 
a  stationary  process  with  an  unknown  mean  can  be  treated  as 


'This  means  that  E  [x  (u)  —  x  (t)]2  — >  0  as  u  — >  t. 
2Actually,  a  measure  on  the  punctured  frequency  axis  /  yt  0. 
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a  member  of  SI  (1).  If  a  random  walk  of  phase  (white  FM), 
which  is  in  SI(1),  has  an  unknown  slope  (frequency)  added 
to  it,  we  can  treat  it  as  a  member  of  SI  (2).  In  this  way  we 
can  get  results  that  are  invariant  to  the  unknown  trend. 

Yaglom’s  theory  of  SI  (d)  was  based  on  the  spectral  density. 
For  stationary  processes,  we  also  have  the  autocovariance 
(ACV)  function 

/OO 

ei27r StSx{f)df ,  (3) 

-OO 

which  satisfies 


sx  ( t  —  u)  =  E  x  (f )  x  ( u ) .  (4) 

The  estimation  methods  given  here  for  SI  ( d )  are  based  on 
a  generalized  ACV  (GACV)  function  [12][13],  also  written 
as  sx  (Z),  that  can  be  obtained  from  Sx  (/)  and  Cd  by  a 
generalized  Fourier  integral  as  follows: 


Sa;  ( t )  = 


2d-l 


J2nft 


-E 


k= 0 


{i2Trft)k 

k\ 


Sx  (/)  df 


+  [  ei2^Sx  (/)  df  +  (E  c2d)  LfL.  (5) 


One  may  add  a  polynomial  of  degree  <  2d—  1  to  sx  (Z)  without 
changing  the  value  of  any  formula  in  which  sx  ( t )  is  properly 
used.  With  this  ambiguity  understood,  Table  I  gives  the  GACV 
for  power-law  components  of  clock  noise,  specified  by  the  one¬ 
sided  spectral  density  of  frequency,  S+  (/)  =  2  (27 r/)2  Sx  (/). 
For  any  stationary  phase  component,  the  GACV  is  the  ACV. 
The  flicker  PM  entry  is  obtained  by  passing  pure  1//  noise 
through  a  moving-average  filter  of  width  r  to  satisfy  the 
second  condition  in  (2).  The  trend  coefficients  are  zero.  The 
degree  of  a  sum  of  noises  is  the  maximum  degree  of  the 
summands,  and  the  GACV  of  the  sum  of  orthogonal  noises  is 
the  sum  of  the  GACVs.  By  (5)  we  may  and  will  assume  that 
the  GACV  is  an  even  function:  sx  (— t)  =  sx  ( Z ). 


TABLE  I 

Generalized  autocovariance  of  power-law  noises 


we  can  freely  perform  and  combine  sums  over  t  without 
worrying  about  the  range  of  the  summations.  Such  a  function 
a  (t)  is  said  to  satisfy  the  moment  condition  of  order  d  [6]  [14] 
if 

E  a  (Z)  tJ  =  0,  j  =  0,...,d-l.  (6) 

t 

In  other  words,  if  x  (Z)  in  J2t  a  (l)  x  (0  *s  replaced  by  a 
polynomial  p  ( Z )  of  degree  <  d  —  1,  the  result  is  zero.  The 
coefficients  of  the  rZth  increment  (1)  satisfy  this  condition.  All 
the  covariances  needed  for  the  estimation  problems  can  be 
calculated  by  the  following  theorem. 

Theorem  1:  Let  sx(t)  be  the  GACV  of  x(t),  a  process 
with  stationary  dth  increments,  where  d  >  1.  If  a  (Z)  and  b  ( t ) 
satisfy  the  moment  condition  of  order  d,  then 

e (e a (*) x w j  (e b (*) x (*) j 

=  y 'g(t)b  ( u )  sx  ( t-u ) .  (7) 

t,U 

For  stationary  processes,  (7)  follows  from  (4),  and  no 
moment  conditions  are  needed.  For  d  >  1,  we  are  only  allowed 
to  take  the  covariance  of  linear  combinations  of  x  ( t )  whose 
coefficients  satisfy  the  moment  condition.  According  to  the 
theorem,  we  may  do  so  as  if  (4)  were  true.  In  reality,  sx  ( t ) 
is  not  an  ACV,  and  (4)  does  not  hold.  Nevertheless,  the  entire 
formula  (7)  is  correct,  even  though  the  corresponding  terms 
of  the  expansions  of  its  left  and  right  sides  are  not  equal. 

We  define  Mti  (x)  as  the  set  of  random  variables 
J2t  a  (t)  x  (t)  whose  coefficients  a  ( t )  satisfy  the  moment 
condition  of  order  d.  Then  Aid  (x)  is  a  linear  subspace,  and 
Theorem  1  tells  us  how  to  calculate  the  covariance  of  two 
members  of  Md{ x).  It  can  be  shown  that  any  member  of 
Md  (x)  is  a  mean-square  limit  of  linear  combinations  of  dth 
increments  of  x  (t);  for  this  reason,  if  Cd  =  0  then  the  members 
of  Md  (x)  have  zero  expectation.  We  also  define  Aid  ( x ,  T)  as 
those  members  of  Md  (x)  whose  coefficients  are  supported  on 
the  finite  set  T. 


Name 

degree 

s£  (/) 

Sx  (t) 

Flicker  PM 

(lowpass) 

I 

h,  fBin2(irf-r) 

^(|-lnr),  t  =  0 

White  FM 

1 

/in 

ho  |t| 

Flicker  FM 

2 

1 

>  ^ 

1 

4 

h-\t2  In  |t| 

2 

Random  walk  FM 

2 

1 

to 

1 

to 

h-2TT 2  |t|3 

Flicker  walk  FM 

3 

h-3f~3 

6 

/i-37r2t4  In  \t\ 

6 

Random  run  FM 

3 

1 

1 

/l_47T4  |t|5 

30 

Now  we  have  to  say  how  the  GACV  is  used.  In  the  following 
discussion,  a  (t)  and  b  ( t )  denote  functions  that  are  zero  except 
on  some  unspecified  finite  set  of  times;  by  using  this  notation 


III.  Clock  prediction 

Let  x  (t)  have  stationary  dth  increments.  The  estimation 
target  is  x(tf),  and  the  estimators  are  of  form 

x  (t*)  =  '22  a  (t)  x  ( t ) ,  (8) 

t£T 

where  T  =  {G,  ■  •  • ,  tn}  with  n  >  d.  (Although  this  problem 
is  called  “prediction”,  there  is  no  need  to  insist  that  t*  > 
maxtj.).  We  now  have  to  make  the  problem  invariant  to  the 
ambiguity  of  x  (t)  with  respect  to  polynomials  p  (t)  of  degree 
<  d—  1.  We  do  so  by  insisting  that  the  estimation  error  x(tfj  — 
x  (Z*)  should  not  change  if  x  (t)  is  replaced  by  x(t)  +p  (t). 
Thus  the  expression 

x  (z*) + p  (u)  -  22 a  W  \x  (t) + P  Ml 

teT 
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should  not  depend  on  p(t).  Consequently, 

^2a(t)p{t)=p{U)  (9) 

ter 

for  any  polynomial  p  ( t )  of  degree  <  d  —  1;  the  estimator 
predicts  p(tf)  perfectly  from  the  values  p(t),t  E  T.  For 
d  =  2,  if  we  add  a  constant  phase  and  frequency  to  x  (i), 
the  estimate  x(i*)  should  automatically  adjust  itself  so  that 
the  error  does  not  change.  An  equivalent  statement  of  this 
invariance  condition  is  that 


X  (t*)  —  X  (f* 

)  e  Md  (x) . 

(10) 

We  may  also  express  (9)  as 

5 Za(t)tJ  =  ti, 

7  =  0,.. 

. ,  d  —  1. 

(ID 

teT 

In  matrix  form,  Ga  = 

g,  where 

a=  [ 

a(ti)  ■ 

n  T 

a  ( tn )  ? 

(12) 

1 

i 

G  = 

ti 

tn 

5 

(13) 

j.d-1 

fd—1 
Ln  _ 

g=[ i  U  ■ 

•  •  tt1] 

A 

(14) 

A  random  variable  x(i*)  of  form  (8)  that  satisfies  the 
invariance  condition  is  called  a  linear  invariant  estimator 
(LIE)  of  order  d  of  x(i*)  based  on  x(t),t  €  T.  Since 
n>  d,  a  LIE  of  order  d  exists,  namely,  the  value  at  f*  of  the 
interpolating  polynomial  determined  by  x  (fy) , . . . ,  x  (to).  For 
d  =  2,  this  LIE  is  the  linear  extrapolator  of  x  (t±)  and  x  (tf) 
to  i*,  which  was  treated  by  Boulanger  and  Douglas  [5].  Some 
of  the  GACVs  of  Table  I  appear  in  their  formulas.  Supposing 
that  ti  <  f2  <  t*,  one  can  often  optimize  t2  —  t\  to  give 
an  MSE  that  is  acceptably  close  to  the  minimum  MSE  that 
one  can  get  from  LIEs  based  on  the  infinite  past  t  <  t2  [4]. 
The  linear  and  quadratic  extrapolators  discussed  by  Vemotte 
et  al.  [6]  are  LIEs  of  order  2  and  3,  respectively.  The  ACVs 
tabulated  in  [6]  are  related  to  the  GACVs  used  here. 

Of  all  the  order-d  LIEs  of  x  (£*),  we  want  the  one  with  the 
smallest  MSE,  called  the  best  LIE  (BLIE)  of  order  d.  The  set 
of  LIEs,  call  it  {LIE},  is  determined  by  the  inhomogeneous 
equations  (11)  for  a(t).  The  difference  of  any  two  LIEs  is 
a  random  variable  J2teT  &  (0  x  W  whose  coefficients  satisfy 
the  corresponding  homogenous  equations,  that  is,  it  belongs 
to  the  linear  subspace  Md(x,  T).  Thus,  {LIE}  is  an  affine 
set,  a  shifted  version  of  Md  (x,  T)  that  does  not  pass  through 
the  origin.  To  find  the  closest  point  of  {LIE}  to  x  (t*)  in  the 
mean-square  sense,  we  drop  a  perpendicular  from  x(i*)  to 
{LIE}.  Thus,  x(t*)  has  to  be  a  member  of  {LIE}  and  also 
has  to  satisfy  the  orthogonality  condition 

x  (f*)  -  x  (i*)  _L  Md(x,T),  (15) 

which  means  that 

E  [x  (f*)  —  x  (f*)]  Y  =  0  whenever  Y  E  Md  (x,  T) .  (16) 


Figure  1  should  make  the  geometry  clear.  It  can  be  proved 
from  basic  facts  about  orthogonal  projections  that  the  BLIE 
x(fy)  exists  and  is  unique. 


Fig.  1 .  Geometric  interpretation  of  the  conditions  for  a  BLIE 

Let 

x  (T)  =  [x  (ti)  ■■■  x  (£„)] T  ,  b  =  [&1  •  •  •  bn\ T  . 

(17) 

Then  (16)  can  be  rewritten  as 

E  [x  (f*)  —  aTx  (T)]  |x  (T)t  b\  =  0  whenever  Gb  =  0. 

(18) 

Both  factors  on  the  left  side  of  (18)  belong  to  Afy  (x);  by 
Theorem  1,  the  expectation  can  be  evaluated  as  if  (4)  were 
true.  Doing  so  gives  the  condition 

(rT  —  aTR )  6  =  0  whenever  Gb  =  0,  (19) 

where  r  is  a  column  vector  and  R  a  symmetric  n  x  n  matrix 
formed  from  the  GACV  of  x  ( t ): 

r  —  (t\  tfj  •  Sx  ifn  fy)]  5  (20) 

R  =  [sx  (ti  -  tj)  :  i,  j  =  1, . .  .,n] .  (21) 

In  turn,  condition  (19)  says  that  the  row  vector  ?,T  —  aT  R  is 
orthogonal  to  all  vectors  b  that  are  orthogonal  to  the  rows  of 
G.  Therefore,  rT  —  aTR  belongs  to  the  row  space  of  G,  that 
is,  r  —  Ra  =  GT9  for  some  d-vector  9  =  \60  ■  ■  ■  9d- i]  . 

We  now  have  the  system  of  equations 

Ga  =  g ,  Ra  +  GT9  =  r,  (22) 

which  constitute  n  +  d  equations  in  the  n  +  d  unknowns  a,  9. 
They  generalize  the  Yule-Walker  equations 

Ra  =  r  (23) 

for  stationary  x  ( t );  in  that  case,  R  is  a  genuine  covariance 
matrix,  and  x  (t*)  is  the  orthogonal  projection  of  x  (f*)  on  the 
unrestricted  subspace  generated  by  x(t)  ,t  E  T.  For  d  >  1, 
R  is  symmetric  if  sx  ( t )  is  even,  but  usually  has  both  positive 
and  negative  eigenvalues. 

After  solving  (22)  for  a  and  9,  we  can  calculate  the  mean 
square  error  of  x  (t*)  as  follows: 

MSE  =  E[x(fy) -x(fy)]2 

=  E  [x  (fy)  —  aTx  (T)]  [x  (t*)  —  x  (T)T  aj 

=  $x  (0)  —  rTa  —  aT  (r  —  Ra) 
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(24) 


by  Theorem  1.  Then,  by  (22), 

MSE  =  ss  (0)  -  rTa  -  gTd. 


We  mention  three  methods  for  solving  (22).  First,  if  n  =  d, 
then  G  is  nonsingular,  and  we  can  solve  (22)  for  a  and  then  for 
6.  Second,  if  R  is  nonsingular3,  then  we  can  write  a  solution 
in  the  following  form  (f?-1  method): 

A  =  GR~1Gt , 

9  =  A -1  (GR-'r  -  g)  , 
a  =  R-1  ( r  -  Gt9)  . 

Third,  we  can  set  up  the  system  (22)  as  the  single  matrix 
equation 


'R 

GT' 

a 

r 

G 

0 

0 

9_ 

and  tell  Matlab4  to  solve  it  in  one  operation  (“brute  force” 
method).  Although  Matlab  often  says  that  the  big  matrix  is 
badly  scaled  or  nearly  singular,  the  solution  seems  to  work 
anyway  in  the  cases  the  author  has  tried. 

Before  carrying  out  this  solution,  we  should  reduce  the 
trend  coefficient  to  zero  as  explained  earlier,  either  subtracting 
the  trend  from  x  ( t )  if  Cd  is  known,  or  increasing  d  by  1  if 
Cd  is  unknown.  Then  x  (f» )  is  unbiased  for  x  (t* )  because 
x  ( t *)  —  x  (i*)  €  Md+i  ( x ),  all  of  whose  members  have  zero 
expectation.  The  penalty  for  increasing  d  is  a  greater  MSE 
for  the  BLIE,  because  the  set  of  LIEs  shrinks  as  the  order 
increases. 


A.  Examples 

1.  Model:  white  FM,  h0  =  1,  d  =  1,  average  frequency 

ci  =  0,  T  =  {0,  — 1, ...,  — 10},  =  5.  We  get  the  expected 

result:  x  (5)  =  x  (0),  MSE  =  4/iot*  =  2.5. 

2.  The  same  example,  but  with  an  unknown  frequency 
(phase  slope)  added.  We  calculate  the  BLIE  of  order  2  to  get 
a  predictor  that  is  invariant  to  the  added  frequency.  The  result 
is  the  linear  extrapolator  a:  (5)  =  §x  (0)  —  |x(— 10),  with 
MSE  =  3.75.  This  is  a  simple  demonstration  of  the  penalty 
paid  for  a  lack  of  knowledge  of  the  trend.  But,  as  T  reaches 
farther  and  farther  into  the  past,  the  estimator  recovers  the 
unknown  frequency,  and  the  MSE  tends  to  4 hoi*. 

3.  Model:  white  FM  (ho  =  2)  +  random  walk  FM  (h_2  = 
2.53  x  10-5),  d  =  2.  Figure  2  shows  normalized  rms  prediction 
error,  VMSE/r,  of  x  (t  +  r)  for  three  predictors:  a)  linear 
extrapolation  from  x  (t)  ,x  (t  —  t),  whose  rms  error  is  \J2 r 
times  Allan  deviation  for  r;  b)  optimal  prediction  based  on 
the  entire  discrete  past  x  (t  —  10?r),  n  =  0, 1, . . .,  calculated  in 
closed  form  by  solving  for  the  stationary  covariance  matrix  of 
a  Kalman  filter  [7];  c)  optimal  prediction  based  on  x(t  —  10 n), 
n  =  0, . . . ,  5,  calculated  by  the  method  given  here. 

4.  Model:  flicker  FM,  h-\  =  1,  d  =  2.  For  t *  =  8,  the 
squares  in  figure  3  show  the  BLIE  regression  coefficients  for 
the  33-point  prediction  set  T  =  {— 32,  — 31, ...,  0}.  Because 


Fig.  2.  Normalized  rms  error  of  predictors  of  a  sum  of  white  FM  and  random 
walk  FM. 


the  coefficients  are  small  off  the  set  {— 32,  — 31,  — 1, 0},  we 
also  try  the  latter  set  for  prediction  (filled  circles).  Figure 
4  shows  VMSE/r  for  both  T  sets  and  for  t*  =  t  = 
1, 2, 4, ... ,  256.  The  lower  horizontal  line  is  for  optimal  pre¬ 
diction  from  the  continuous  past,  t  <  0.  The  upper  horizontal 
line  is  for  simple  linear  extrapolation  from  T  =  {— r,  0}  as 
before.  We  observe  only  a  small  error  penalty  for  using  the  4- 
point  set  instead  of  the  3  3 -point  set.  As  r  becomes  large,  the 
linear  extrapolator  becomes  better  than  the  other  predictors, 
which  see  only  32  units  into  the  past  of  this  long-memory 
process. 
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Fig.  3.  Regression  coefficients  for  predicting  flicker  FM 


IV.  Trend  estimation 


Let  x  ( t )  have  stationary  dth  increments  (d  >  0),  with  an 
unknown  trend  coefficient  crj.  Then 

td 

x  (t)  =  Cd-jj  +  x°  (i) ,  (26) 


3The  author  does  not  know  any  conditions  for  R  to  be  nonsingular.  where  xo  ( t )  has  trend  coefficient  zero.  Again  there  will 

4Copyright  by  The  MathWorks,  Inc.  be  invariance  and  orthogonality  conditions  to  determine  the 
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prediction  interval, r 


Fig.  4.  Normalized  rms  errors  of  predictors  of  flicker  FM 


optimal  estimator  of  Cd  having  the  form 


cd  =  Ya(t)x  (t)  ■ 

teT 

(27) 

The  invariance  condition  is  determined  by  the  requirement  that 
the  estimation  error  Cd  —  Cd  be  unchanged  if  a  polynomial 
of  degree  <  d  is  added  to  x  (£).  Thus  the  error  is  invariant 
to  the  trend  cdtd/d\  itself  as  well  as  to  polynomials  p(t )  of 
degree  <  d  —  1  added  to  x  (£).  After  replacing  (26)  by  x  (f)  = 
Cdtd /d\  +  xq  (t)  +  p  (t),  we  obtain 

Cd~  cd  =  cd  1  -  ^  a  {t )  td 

’  teT 

(O+rO)]- 

teT 

(28) 

If  Cd  —  Cd  is  invariant  to  Cd  and  p(t),  then 

^a(£)fJ=0,  j  =  0, . . .  ,d  —  1, 

teT 

(29) 

^3 

II 

"13 

(30) 

and  the  estimator  gives  the  right  answer  Cd  if  x  (t)  is  a 
polynomial  of  degree  <  d.  A  random  variable  of  form  (27)  is 
said  to  be  a  LIE  of  Cd  if  (29)  and  (30)  hold.  We  write  this  in 
the  matrix  form  Ga  =  g  again,  but  with  different  definitions 
of  G  and  g : 

'1  •••  r 

*  *  *  tn 

G=  .  .  ,  (31) 

j-d  +d 

_ll  '  '  '  ln_ 

g=[ 0  •••  0  rf!]T.  (32) 

Assume  that  T  has  at  least  d  +  I  points.  Then  a  LIE  of 
Cd  exists,  namely,  the  coefficient  of  td/d\  in  the  interpolating 


polynomial  determined  by  x  (ti) , . . . ,  x  (td+ 1).  Moreover,  the 
difference  of  any  two  LIEs  is  in  the  subspace  Md+\  (; x ,  T), 
the  set  of  b  M  x  (t)  such  that  b  (t)  satisfies  the  moment 
condition  of  order  d  +  1.  As  with  the  prediction  problem,  a 
LIE  Cd  is  the  BLIE  if  it  satisfies  the  orthogonality  condition 

cd-cd±-  Md+ i  (. x ,  T) .  (33) 

By  (28),  (29),  and  (30), 

Cd-cd  =  -Ya  (t)  x0  (t) ,  (34) 

teT 

which  belongs  to  Md{xo)  by  (29).  Also,  Md+i(x,T)  = 
Md+\  (xo,  T)  C  Md  (xo),  because  every  member  of  Md+i  (x) 

has  coefficients  that  kill  the  trend.  Therefore,  (33)  can  be 
written  as 

E  [— aTx0  (T)]  [x0  (T)t  hj  =  0  whenever  Gb  =  0,  (35) 

where  xo  (T)  is  a  column  vector  like  x(T).  Because  both 
factors  in  the  left  side  of  (35)  belong  to  Md  (x0),  we  may  use 
Theorem  1  for  xo  (t)  to  evaluate  this  expression,  giving  the 
condition 

—arRb  =  0  whenever  Gb  =  0,  (36) 

where  R  is  defined  by  (21)  with  sx  (t)  replaced  by  sXo  { t ). 
By  the  same  argument  as  before,  —Ra  =  GT9  for  some 
(d  +  l)-vector  9  =  [90  ■  ■  ■  9d]  .  We  arrive  at  the  system 

of  equations 

Ga  =  5,  Ra  +  GT6  =  0,  (37) 

which  is  like  (22),  except  that  now  there  are  n+d+1  equations 
and  unknowns,  the  definitions  of  G  and  g  are  different,  and  r 
has  become  0.  The  same  solution  methods  are  available. 

By  (33),  Cd  is  unbiased  for  cd  because  all  the  members  of 
Md+i  (x)  have  mean  zero.  By  (34),  Theorem  1,  and  (37),  we 
can  calculate  the  MSE  of  cd  by 

MSE  =  E  {cd  —  cdf  =  E  [— aTx0  (T)]  [-x0  (T)t  a] 

=  aTRa  =  —gT9  =  -d\6d.  (38) 

A.  Examples 

1.  Model:  x(t)  =  white  FM  +  cif  (unknown  frequency), 
ho  =  1,  d  =  1,  T  =  {0, 1, 10}.  Result:  c\  = 
i[x(10)-x(0)],  MSE= 

2.  Model:  x  (t)  =  x0  (f)  +  \ c2t 2  where  x0  (t)  is  white 
FM,  flicker  FM,  or  random  walk  FM,  T  =  {0, 1, 10}.  The 
regression  coefficients  for  C2  are  shown  in  Fig.  5.  Even  though 
white  FM  is  in  SI(1),  we  have  to  treat  it  as  a  member  of  SI(2) 
to  extract  a  quadratic  trend,  independently  of  any  linear  trend 
that  may  also  be  present.  For  white  FM,  c2  is  also  the  slope 
of  the  least-squares  linear  fit  to  the  frequency  data. 
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Fig.  5.  Regression  coefficients  for  estimating  quadratic  drift  rate  in  the 
presence  of  white  FM,  flicker  FM,  or  random  walk  FM. 


V.  Additional  results 

A.  Prediction  from  equally  spaced  data 

For  fixed  d,  it  takes  O  (n3)  operations  to  solve  the  Yule- 
Walker  equations  (23)  (when  d  =  0)  or  their  generalization 
(22)  by  general  linear  equation  solving  methods,  where  n  is 
the  number  of  elements  in  T.  If  T  is  an  equally  spaced  set 
of  times,  however,  then  R  is  a  Toeplitz  matrix.  For  stationary 
x  ( t ),  the  Levinson-Durbin  algorithm  [15],  which  is  a  loop  on 
n,  calculates  the  regression  coefficients  and  the  MSE  in  O  ( 'n 2) 
operations.  The  author  has  been  able  to  extend  this  algorithm 
to  the  case  d  >  1  while  keeping  the  O  (n2)  property  for  fixed 
d.  In  cases  that  have  been  tried,  the  results  of  the  two  general 
algorithms  (7?_1  and  brute  force)  and  the  extended  Levinson 
algorithm  agree  within  roundoff  error. 
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B.  Trend  from  symmetric  data 

Suppose  that  T  is  symmetric  about  some  point,  which  we 
assume  to  be  zero;  then  —  T  =  T.  It  can  be  shown  that  the 
optimal  trend  coefficient  estimator  dd  has  coefficients  that  are 
even  (a  (—t)  =  a  (t))  if  d  is  even,  and  odd  (a  (— t)  =  —a  (t)) 
if  d  is  odd.  In  either  case  one  can  set  up  equations  like  (37) 
for  a(t),  t  >  0,  and  a  smaller  auxiliary  vector  0.  Thus,  the 
dimension  n  +  d  +  1  of  the  system  (37)  can  be  reduced  by 
approximately  a  factor  of  two. 


This  work  was  carried  out  at  the  Jet  Propulsion  Laboratory, 
California  Institute  of  Technology,  under  a  contract  with  the 
National  Aeronautics  and  Space  Administration. 
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